Alberto Cano, Ph.D.

Associate Professor

  • Richmond, VA, United States
  • Engineering Research Building, Room 2314
  • acano@vcu.edu

Dr. Cano specializes in machine learning, data mining, classification, big data, data streams, and high-performance computing.

Biography

Alberto Cano is an Associate Professor in the Department of Computer Science at Virginia Commonwealth University, Richmond, Virginia, United States, where he heads the High-Performance Data Mining laboratory. His research focuses on machine learning, big data, data streams, concept drift, continual learning, GPUs, and distributed computing. He is also the Faculty Director of the High Performance Research Computing Core Facility at VCU: https://hprc.vcu.edu/

Areas of Expertise

Machine Learning
Data Mining
Classification
Big Data
High Performance Computing

Accomplishments

Top 2% of most-cited researchers in the field of AI, per the Stanford University ranking

2022-12-01

Stanford University Scientist Rankings

Amazon Machine Learning Award

2018-08-01

Hate Speech Detection on Amazon Reviews using Data Stream Mining on Spark and AWS

Education

University of Granada, Spain

Ph.D.

Computer Science

2014

University of Cordoba, Spain

M.Sc.

Intelligent Systems

2013

University of Granada, Spain

M.Sc.

Soft Computing and Intelligent Systems

2011


Research Grants

MRI: Track 1 Acquisition of NVIDIA DGX H100 GPU system for research and education at VCU

National Science Foundation

2023-09-06

NSF MRI


SentimentVoice: Integrating emotion AI and VR in Performing Arts

Commonwealth Cyber Initiative

2023-06-01

Integrating emotion AI and VR in Performing Arts

HPRC research computing clusters

State Council of Higher Education for Virginia

2022-12-01

HPRC research computing clusters


Courses

CMSC 508 - Databases

Database Theory

CMSC 603 - High Performance Distributed Systems

High Performance Distributed Systems

Selected Articles

A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework

Machine Learning

G. Aguiar, B. Krawczyk, and A. Cano

2023-06-01

Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures and benchmarks on how to evaluate these algorithms. This work proposes a standardized, exhaustive, and comprehensive experimental framework to evaluate algorithms in a collection of diverse and challenging imbalanced data stream scenarios. The experimental study evaluates 24 state-of-the-art data stream algorithms on 515 imbalanced data streams that combine static and dynamic class imbalance ratios, instance-level difficulties, concept drift, and real-world and semi-synthetic datasets in binary and multi-class scenarios. This leads to a large-scale experimental study comparing state-of-the-art classifiers in the data stream mining domain. We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios and we provide general recommendations to end-users for selecting the best algorithms for imbalanced data streams. Additionally, we formulate open challenges and future directions for this domain. Our experimental framework is fully reproducible and easy to extend with new methods. This way, we propose a standardized approach to conducting experiments in imbalanced data streams that can be used by other researchers to create complete, trustworthy, and fair evaluations of newly proposed methods. Our experimental framework can be downloaded from https://github.com/canoalberto/imbalanced-streams.
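The evaluation protocol underlying such benchmarks is prequential (test-then-train) assessment: each arriving instance is scored before the model learns from it, so every prediction is out-of-sample. The following minimal Python sketch, with an invented majority-class baseline (it is not the authors' experimental framework), illustrates the protocol and why plain accuracy misleads under class imbalance:

```python
from collections import Counter

class MajorityClass:
    """Toy baseline learner: always predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1

def prequential(stream, model):
    """Test-then-train evaluation: score each instance, then learn from it."""
    hits = Counter()   # per-class correct predictions
    seen = Counter()   # per-class instance counts
    for x, y in stream:
        if model.predict(x) == y:
            hits[y] += 1
        seen[y] += 1
        model.learn(x, y)
    accuracy = sum(hits.values()) / sum(seen.values())
    recall = {y: hits[y] / seen[y] for y in seen}  # per-class recall
    return accuracy, recall

# A 95:5 imbalanced stream: high accuracy, zero minority-class recall.
acc, rec = prequential([((0,), 0)] * 95 + [((1,), 1)] * 5, MajorityClass())
```

On this stream the baseline reaches 94% accuracy while never predicting the minority class, which is why imbalanced-stream studies report per-class and skew-insensitive metrics rather than accuracy alone.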


ROSE: Robust Online Self-Adjusting Ensemble for Continual Learning on Imbalanced Drifting Data Streams

Machine Learning

A. Cano and B. Krawczyk

2022-11-01

Data streams are potentially unbounded sequences of instances arriving over time to a classifier. Designing algorithms that are capable of dealing with massive, rapidly arriving information is one of the most dynamically developing areas of machine learning. Such learners must be able to deal with a phenomenon known as concept drift, where the data stream may be subject to various changes in its characteristics over time. Furthermore, distributions of classes may evolve over time, leading to a highly difficult non-stationary class imbalance. In this work we introduce Robust Online Self-Adjusting Ensemble (ROSE), a novel online ensemble classifier capable of dealing with all of the mentioned challenges. The main features of ROSE are: (1) online training of base classifiers on variable size random subsets of features; (2) online detection of concept drift and creation of a background ensemble for faster adaptation to changes; (3) sliding window per class to create skew-insensitive classifiers regardless of the current imbalance ratio; and (4) self-adjusting bagging to enhance the exposure of difficult instances from minority classes. The interplay among these features leads to an improved performance in various data stream mining benchmarks. An extensive experimental study comparing with 30 ensemble classifiers shows that ROSE is a robust and well-rounded classifier for drifting imbalanced data streams, especially under the presence of noise and class imbalance drift, while maintaining competitive time complexity and memory consumption. Results are supported by a thorough non-parametric statistical analysis.
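Feature (3) above, the per-class sliding window, can be sketched in a few lines of Python. This is an illustrative reimplementation of the idea only, with invented class and parameter names, not the authors' ROSE code:

```python
from collections import defaultdict, deque

class PerClassWindow:
    """Keep the most recent `size` instances of each class separately, so
    minority-class examples are not crowded out of one shared buffer."""
    def __init__(self, size):
        # Each class gets its own fixed-length buffer; old instances fall out.
        self.buffers = defaultdict(lambda: deque(maxlen=size))

    def add(self, x, y):
        self.buffers[y].append(x)

    def training_set(self):
        # A snapshot whose class proportions are bounded by the window size,
        # regardless of the stream's current imbalance ratio.
        return [(x, y) for y, buf in self.buffers.items() for x in buf]

# A 95:5 stream still yields at most `size` instances per class for training.
w = PerClassWindow(size=10)
for i in range(95):
    w.add((i,), 0)
for i in range(5):
    w.add((i,), 1)
```

Training base classifiers on such snapshots is what makes them skew-insensitive: the majority class can never contribute more than the window size, while every minority instance seen recently is retained.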


Kappa Updated Ensemble for Drifting Data Stream Mining

Machine Learning

A. Cano and B. Krawczyk

2019-08-30

Learning from data streams in the presence of concept drift is among the biggest challenges of contemporary machine learning. Algorithms designed for such scenarios must take into account the potentially unbounded size of the data, its constantly changing nature, and the requirement for real-time processing. Ensemble approaches for data stream mining have gained significant popularity, due to their high predictive capabilities and effective mechanisms for alleviating concept drift. In this paper, we propose a new ensemble method named Kappa Updated Ensemble (KUE). It is a combination of online and block-based ensemble approaches that uses the Kappa statistic for dynamic weighting and selection of base classifiers. In order to achieve a higher diversity among base learners, each of them is trained using a different subset of features and updated with new instances with a probability following a Poisson distribution. Furthermore, we update the ensemble with new classifiers only when they contribute positively to the improvement of the quality of the ensemble. Finally, each base classifier in KUE is capable of abstaining from voting, thus increasing the overall robustness of KUE. An extensive experimental study shows that KUE is capable of outperforming state-of-the-art ensembles on standard and imbalanced drifting data streams while having a low computational complexity. Moreover, we analyze the use of Kappa versus accuracy as the criterion to select and update the classifiers, the contribution of the abstaining mechanism, the contribution of the diversification of classifiers, and the contribution of the hybrid architecture to update the classifiers in an online manner.
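The mechanics described above — per-member random feature subsets, Kappa-weighted voting, Poisson-distributed online updates, and abstention — can be sketched as follows. This is a toy illustration with an invented nearest-centroid base learner and invented names, not the published KUE implementation:

```python
import math
import random
from collections import defaultdict

class CentroidLearner:
    """Toy online base learner: per-class running mean over a random feature subset."""
    def __init__(self, n_features, rng):
        k = max(1, n_features // 2)
        self.features = rng.sample(range(n_features), k)  # diversity via feature subsets
        self.sums, self.counts = {}, {}

    def learn(self, x, y):
        s = self.sums.setdefault(y, [0.0] * len(self.features))
        for i, f in enumerate(self.features):
            s[i] += x[f]
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return None  # abstain until the learner has seen data
        def sqdist(y):
            c = self.counts[y]
            return sum((x[f] - self.sums[y][i] / c) ** 2
                       for i, f in enumerate(self.features))
        return min(self.counts, key=sqdist)

class KappaWeightedEnsemble:
    """Sketch of the core loop: Kappa-weighted voting, Poisson(1) updates, abstention."""
    def __init__(self, n_members, n_features, seed=0):
        self.rng = random.Random(seed)
        self.members = [CentroidLearner(n_features, self.rng) for _ in range(n_members)]
        self.conf = [defaultdict(int) for _ in self.members]  # (true, pred) counts

    def _poisson(self, lam=1.0):
        # Knuth's sampler, adequate for small lambda.
        L, k, p = math.exp(-lam), 0, 1.0
        while p > L:
            k += 1
            p *= self.rng.random()
        return k - 1

    def _kappa(self, i):
        cm = self.conf[i]
        n = sum(cm.values())
        if n == 0:
            return 0.0
        labels = {y for pair in cm for y in pair}
        p0 = sum(cm.get((y, y), 0) for y in labels) / n           # observed agreement
        pe = sum(sum(v for (t, _), v in cm.items() if t == y) *
                 sum(v for (_, p), v in cm.items() if p == y)
                 for y in labels) / (n * n)                        # chance agreement
        return (p0 - pe) / (1 - pe) if pe < 1 else 0.0

    def predict(self, x):
        votes = defaultdict(float)
        for i, m in enumerate(self.members):
            y = m.predict(x)
            if y is not None:                      # abstaining members cast no vote
                votes[y] += max(self._kappa(i), 0.0) + 1e-9
        return max(votes, key=votes.get) if votes else None

    def learn(self, x, y):
        for i, m in enumerate(self.members):
            pred = m.predict(x)
            if pred is not None:
                self.conf[i][(y, pred)] += 1       # prequential Kappa bookkeeping
            for _ in range(self._poisson()):        # online bagging-style update
                m.learn(x, y)

# Two well-separated classes: the ensemble learns them from the stream.
ens = KappaWeightedEnsemble(n_members=5, n_features=2, seed=42)
stream = [((0.0, 0.1), 0), ((5.0, 4.9), 1), ((0.1, -0.1), 0), ((4.8, 5.2), 1)] * 10
for x, y in stream:
    ens.learn(x, y)
```

Weighting votes by Kappa rather than accuracy is what keeps the ensemble honest on imbalanced streams: a member that merely echoes the majority class earns a Kappa near zero and loses its influence on the vote.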

